2010-01-27:

The tale of Syndicate Wars Port

hard:reverse engineering:re:assembler:games:gamedev:x86:asm:windows:linux:macosx:c:syndicate wars
As promised, it's time to reveal the technical story behind the Syndicate Wars Port. The story is divided into two parts - the first and the second attempt to port this game. Comments are welcome!

[UPDATE: Video from Recon conference in 2010: "Syndicate Wars Port: How to port a DOS game to modern systems"]

The initial attempt


We made the first attempt to port this game a few years back (I think it was 5 years ago). The plan was simple - create a disassembler, and try to find all the dependencies. Sounds simple!
OK, first of all, why the hell did we want to create a disassembler almost from scratch? Two reasons:
1. We didn't know of any disassembler that handled LE files correctly (we didn't know about IDA back then). For those of you who never played with the DOS4GW extender - the executable file consists (similarly to a PE file) of two parts: a 16-bit DOS stub that executes dos4gw.exe, and a 32-bit Linear Executable that is loaded by the dos4gw.exe loader. Of course, the LE part contains the application code/data/etc. (check http://www.tenberry.com/dos4g/faq/format.html for additional information).
2. Since the dead-listing was going to be recompiled, it had to be compatible with the input format of an assembler of our choosing (and we chose the Netwide Assembler, aka NASM).
Additionally, writing your own disassembler lets you add features that are useful for the case at hand, like a user-provided symbol table, a whitelist of data regions (data regions not on the list are left out of the listing), or a list of vtable regions (at first we thought that the game was written in C++, having mistaken a switch jump table for a vtable :>).
The disassembler (called ledisasm) was written by Unavowed, and it used ndisasm (from the Netwide Assembler package) as the disassembling engine.
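To make the two-part layout a bit more concrete, here is a minimal Python sketch (not part of ledisasm; the function name is made up for illustration) that locates the LE header behind the DOS stub. It assumes the stub follows the usual MZ convention of storing the offset of the new-style header as a 32-bit value at offset 0x3C, which may not hold for every DOS4GW-built file.

```python
import struct


def find_le_header(data: bytes) -> int:
    """Return the file offset of the LE header that follows the DOS stub."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ (DOS) executable")
    # Assumption: offset 0x3C of the MZ header holds the offset of the
    # new-style header (the same field PE files use).
    (offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[offset:offset + 2] != b"LE":
        raise ValueError("no LE signature at offset 0x%x" % offset)
    return offset
```

For a real file you would pass in the contents of the executable, e.g. `find_le_header(open("MAIN.EXE", "rb").read())`.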

Ah, one thing - Unavowed is a GNU/Linux person, while I owned a Windows box, so we had to write everything in a way that would work on both our systems (which restricted us to creating only console applications). Back to the story...

With the disassembler ready (that's a simplification - Unavowed made fixes to it from time to time, so we had to use a diff/patch method (plus some sed scripts) to carry our changes over between dead-listing regenerations), we could start looking for the dependencies: the C library functions, the I/O calls (mostly int/out/in instructions or memory-mapped I/O references), and DOS4GW environment-specific dependencies.

But first! Some statistics: the listing we got weighed over 14 MB, and consisted of over 1,070,000 lines of assembly code and data (dd, db, etc.).

OK, but how do you find, let's say, an open/fopen function in such an insanely large assembly soup? Look for interrupts and trace the cross-references! Of course, we used the best existing interrupt list - Ralf Brown's Interrupt List (http://www.ctyme.com/rbrown.htm).
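The "look for interrupts" step can be sketched in a few lines of Python. This is only an illustration, not one of our original tools: it assumes a listing where labels start at column 0 and end with a colon, and instructions are indented, and it reports which labelled block contains which `int` instructions. From there you look the interrupt numbers up in the list and follow the cross-references by hand.

```python
import re
from collections import defaultdict

LABEL_RE = re.compile(r"^(\w+):")
INT_RE = re.compile(r"^\s+int\s+(0x[0-9a-fA-F]+)")


def interrupts_by_function(listing: str) -> dict:
    """Map each label to the set of interrupt numbers used below it."""
    found = defaultdict(set)
    current = None
    for line in listing.splitlines():
        label = LABEL_RE.match(line)
        if label:
            current = label.group(1)
            continue
        call = INT_RE.match(line)
        if call and current is not None:
            found[current].add(int(call.group(1), 16))
    return dict(found)
```

The label and listing format here are assumptions; a real dead-listing needs a regular expression adjusted to whatever the disassembler emits.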

One might say "Hey! Wait a minute there! How come the 32-bit application was allowed to use the 16-bit DOS/VBIOS/BIOS/etc. interrupts?". It is a good question, and this is where DPMI (DOS Protected Mode Interface) comes in. In short, DPMI, which is integrated into DOS4GW, registers some ISRs (Interrupt Service Routines) in the IDT (Interrupt Descriptor Table). When called, these switch to 16-bit real mode (I'm not sure whether DOS4GW implements this using the VM86 method or plain real mode), call whatever they're supposed to call, and jump back to protected mode.
Additionally, there were functions like int386 or int386x, which allowed calling the 16-bit interrupts via DPMI from high-level languages like C.

Btw, int386x is implemented in a funny way (though I admit that there are not many ways to do it, and this one is probably the fastest):

____int386x_:
[...]
 call func_02628
[...]
func_02628:
 lea esi,[esi+esi*2]
 lea eax,[cs:esi+func_02631]
 push eax
; push the return address
[...]
 ret

func_02631:
 int 0x0
 ret
 int 0x1
 ret
 int 0x2
 ret
 int3
 nop
 ret
 int 0x4
 ret
 int 0x5
[...] ; yep, there is an int XX + ret for every interrupt
 int 0xfc
 ret
 int 0xfd
 ret
 int 0xfe
 ret
 int 0xff
 ret
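Why `lea esi,[esi+esi*2]`? Every entry in that table is exactly 3 bytes long (`int N` is 2 bytes plus a 1-byte `ret`, and the 1-byte `int3` is padded with a `nop`), so the stub for interrupt N sits at `func_02631 + 3*N`. A small Python sketch that rebuilds such a table from the standard x86 encodings and checks the arithmetic (the function names are made up for illustration):

```python
def build_int_table() -> bytes:
    """Rebuild a 256-entry 'int N; ret' table like the one at func_02631."""
    table = bytearray()
    for n in range(256):
        if n == 3:
            table += bytes([0xCC, 0x90, 0xC3])  # int3; nop; ret
        else:
            table += bytes([0xCD, n, 0xC3])     # int n; ret
    return bytes(table)


def entry_offset(n: int) -> int:
    """Same computation as lea esi,[esi+esi*2]: n + n*2."""
    return n + n * 2


table = build_int_table()
off = entry_offset(0x21)
print(table[off:off + 3].hex())  # the stub for int 0x21
```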


OK, back to the topic!
After finding a few functions, we found a reference to Watcom somewhere, which led us to the Open Watcom compiler (http://www.openwatcom.org/) and the source code of its standard libraries. Searching for functions when you have their source is much faster than when you have nothing (the sources were accurate to about 95%). Additionally, we could confirm our findings, and also change some names from __i_think_its_fopen_but_im_not_sure_please_double_check to _freopen :)

While reverse engineering the assembly code (we translated the code to pseudo-C, since C is easier to read than assembly), we created some tools (in Perl) to help with these translations: a very simple “decompilation” of some instructions and if blocks. It was buggy, but it was enough to speed things up a little (it was really simple, nothing even close to what the modern Hex-Rays decompiler can do). Also, I had a command-line PHP script that changed the function names to color names (it's easier to remember and distinguish func_red and loc_cyan than func_189275 and loc_9ac61b), but in the end we didn't use it much.

When we had found about 50% of the functions, we met a guy (hi joostp!) who was working on a remake of the first Syndicate. After we told him what we were doing, he showed mercy and gave us a list of functions found by IDA Pro in MAIN.EXE (the Syndicate Wars executable), which saved us a few weeks of finding the rest of the functions.

Having the functions, we could replace the function implementations with calls to the modern native libc (glibc or msvcrt). However, the calls couldn't be done with a simple 'jmp libc.func', since Watcom uses its own fastcall-style, register-based calling convention (take a look at “Calling conventions for different C++ compilers and operating systems” by Agner Fog), which, of course, is not compatible with the cdecl convention used in both glibc and msvcrt. So, we created a Python script that took a list of functions with some kind of prototype descriptors, and generated the wrappers. Additionally, the script handled the win32/gnu differences (like the leading underscore required for cdecl functions in object files on Windows) and added debug-aiding messages.

The configuration file for the wrapper.py script looked like this:
# v - vararg: like cdecl but used for functions with v[name] variant
#
# args is a sequence of zero or more of:
# i - int
# x - int (displayed in hex)
# p - void * (general pointer)
# s - char *
# c - char
#
# name type args
access p sx
asctime p
atoi p s
[...]

A sample wrapper looks like this:
_c_access:
       push ebx
       push ecx
       push edx
       push esi
       push edi
       push edx
       push eax
       push edx
       push eax
       push dword .debug_str
       call printf
       add esp, byte +0xc
       call access
       add esp, byte +0x8
       pop edi
       pop esi
       pop edx
       pop ecx
       pop ebx
       ret
.debug_str:
       db 'access("%s", 0x%x)', 0xa, 0x0


You may be surprised by the push/pop of edx and ecx in the above code, since normally the callee only has to preserve the ebx, esi, edi and ebp registers, and both edx and ecx are considered scratch registers. Well, guess what: in the Watcom clib (clib, libc, crt, geeez, these people should make up their minds!) both edx and ecx are callee-saved registers. Believe me, we learned this the hard way ;p
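For the curious, a generator for wrappers of this shape fits in a few lines of Python. What follows is a simplified reconstruction for illustration, not the original wrapper.py: the function name and the format table are made up, it ignores the "type" column of the config file, and it only handles up to four arguments, which the Watcom register convention passes in eax, edx, ebx and ecx (in that order).

```python
WATCOM_ARG_REGS = ["eax", "edx", "ebx", "ecx"]
SAVED_REGS = ["ebx", "ecx", "edx", "esi", "edi"]
# printf format for each argument code of the config file (assumed mapping)
PRINTF_FORMATS = {"i": "%i", "x": "0x%x", "p": "%p", "s": '"%s"', "c": "%c"}


def make_wrapper(name: str, args: str) -> str:
    """Emit a NASM wrapper that calls cdecl `name` from Watcom-style code."""
    if len(args) > len(WATCOM_ARG_REGS):
        raise ValueError("stack-passed arguments are not handled in this sketch")
    regs = WATCOM_ARG_REGS[:len(args)]
    # cdecl pushes arguments right to left
    arg_pushes = ["    push %s" % r for r in reversed(regs)]
    lines = ["_c_%s:" % name]
    lines += ["    push %s" % r for r in SAVED_REGS]
    lines += arg_pushes                      # arguments for the real call
    lines += arg_pushes                      # the same again for the debug printf
    lines.append("    push dword .debug_str")
    lines.append("    call printf")
    lines.append("    add esp, %d" % (4 * (len(args) + 1)))
    lines.append("    call %s" % name)       # result stays in eax
    if args:
        lines.append("    add esp, %d" % (4 * len(args)))
    lines += ["    pop %s" % r for r in reversed(SAVED_REGS)]
    lines.append("    ret")
    fmt = ", ".join(PRINTF_FORMATS[a] for a in args)
    lines.append(".debug_str:")
    lines.append("    db '%s(%s)', 0xa, 0x0" % (name, fmt))
    return "\n".join(lines)


print(make_wrapper("access", "sx"))
```

Calling it with `"access"` and `"sx"` produces a wrapper with the same instruction sequence as the sample above (the win32 underscore handling is left out).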

As for the debug messages, they were of course printed to stdout. At one point we also added printing of the return address, and hooked the output up to a tool written in C that replaced the addresses in the debug output with symbols. The tool had a symbol map of the functions (to be exact, an objdump symbol table converted into a hash table and cached on the hard disk between runs).

The input looked like this:
004DAAA5 read(3, 0096F95C, 1024)
004DA9FD close(3)
004271EB strcmp("CD", "CD")
004271EB strcmp("InstallDrive", "CD")
004271EB strcmp("InstallDrive", "InstallDrive")
004271EB strcmp("Language", "CD")

and the output:
<func_02046+1d> read(3, 0096F95C, 1024)
<func_02044+11> close(3)
<func_00268+91/jump_02574+12> strcmp("CD", "CD")
<func_00268+91/jump_02574+12> strcmp("InstallDrive", "CD")
<func_00268+91/jump_02574+12> strcmp("InstallDrive", "InstallDrive")
<func_00268+91/jump_02574+12> strcmp("Language", "CD")
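The core of such an address-to-symbol filter is a "nearest symbol below the address" lookup. Our tool was written in C and used a hash table; the Python sketch below is a simplified stand-in that uses a sorted list with binary search and reports only the single nearest symbol. The two symbol addresses are derived from the sample output above (0x004DAAA5 - 0x1d and 0x004DA9FD - 0x11); the function and variable names are made up.

```python
import bisect
import re

# (address, name) pairs, sorted by address
SYMBOLS = [(0x004DA9EC, "func_02044"), (0x004DAA88, "func_02046")]
ADDRESSES = [addr for addr, _ in SYMBOLS]
LINE_RE = re.compile(r"^([0-9A-Fa-f]{8}) (.*)$")


def symbolize(line: str) -> str:
    """Replace a leading return address with <symbol+offset>."""
    m = LINE_RE.match(line)
    if not m:
        return line
    addr = int(m.group(1), 16)
    i = bisect.bisect_right(ADDRESSES, addr) - 1
    if i < 0:
        return line  # address lies below the first known symbol
    base, name = SYMBOLS[i]
    return "<%s+%x> %s" % (name, addr - base, m.group(2))


print(symbolize("004DAAA5 read(3, 0096F95C, 1024)"))
```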


Ah, speaking of debugging - we mainly used the GNU Debugger (gdb), since it was the only debugger both Unavowed and I could use. To speed things up a little, we created some GDB scripts that made it a little more usable. E.g.:
define hardtrace
 echo Hardtracing the stack...\n
 set $max = 0x00520000
 set $min = 0x00400000
 set $cnt = $esp
 set $iter = 1
 printf "[00] "
 info symb $eip
 while 1
   set $temp = *(unsigned int*)$cnt & 0xffff0000
   if $temp >= $min && $temp <= $max
     printf "[%.2i] ", $iter
     set $iter = $iter + 1
     info symb *(unsigned int*)$cnt
   end
   set $cnt = $cnt + 4
 end
end

(yes, this script is a brute-force call-stack walker)
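The idea behind the script, stripped of gdb: treat every 32-bit word on the stack as a potential return address and keep the ones that fall into the code range. A Python sketch of the same filter, working on a raw stack dump instead of live memory (the function name is made up, the range constants are the ones from the script):

```python
import struct


def hardtrace(stack: bytes, lo: int = 0x00400000, hi: int = 0x00520000) -> list:
    """Return every stack word that looks like an address inside the code."""
    usable = len(stack) // 4 * 4  # ignore a trailing partial word
    hits = []
    for (word,) in struct.iter_unpack("<I", stack[:usable]):
        # same test as the gdb script: compare the upper 16 bits only
        if lo <= (word & 0xFFFF0000) <= hi:
            hits.append(word)
    return hits


dump = struct.pack("<4I", 0x12345678, 0x004DAAA5, 0x0096F95C, 0x004271EB)
print([hex(w) for w in hardtrace(dump)])
```

Like the original, this happily reports stale values and data that merely looks like a code address, so the result is a list of candidates, not a real backtrace.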

After the standard C functions started working, the next step was to see what crashes first, analyze it, fix it, and then do the same with the next place where the game crashed (please note that at that moment we had nothing more than a few debug messages showing on the console). Of course, since the C functions worked, the things that crashed were the I/O functions.

After some time, we managed to block (block, not fix) the I/O functions of the keyboard, sound, and mouse, and we focused on the graphic routines.

Some, like the palette changing, were easy to find, since they used known port numbers - the out/in instructions were the key here, along with Ralf Brown's Port List (yes, Ralf Brown's XYZ List again). For example, the palette changing function looks like this:

;------------------------------------------------------
func_00889:             ; 0006f9dc
;------------------------------------------------------
               push ecx
               mov ch,dl
               mov cl,al
               mov dx,0x3c8
               xor al,al
               out dx,al ; palette color number
               mov dl,0xc9
               mov al,cl
               out dx,al ; red
               mov al,ch
               out dx,al ; green
               mov al,bl
               out dx,al ; blue
               [...]
               pop ecx

               ret


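The above function was translated into SDL-compatible palette changing code (in C of course):
#include <stdint.h>
#include <stdio.h>
#include <SDL.h>

/* the 8-bit display surface, assumed to be created elsewhere */
extern SDL_Surface *screen;

void
set_palette(const uint8_t *palette)
{
 SDL_Color colors[256];
 int x;
 const uint8_t *p;

 printf("set_palette(%p)\n", palette);

 /* the input is 256 RGB triplets with 6-bit (0..63) VGA DAC components */
 for (p = palette, x = 0; x < 256; x++, p += 3)
 {
   /* scale the 6-bit components to the 8-bit range SDL expects */
   colors[x].r = p[0] * 4;
   colors[x].g = p[1] * 4;
   colors[x].b = p[2] * 4;
   printf("[ %i %i %i ], ", colors[x].r,colors[x].g,colors[x].b );
 }
 
 /* SDL_SetPalette() returns 1 only if all the colors were set exactly */
 if (SDL_SetPalette(screen, SDL_LOGPAL | SDL_PHYSPAL, colors, 0, 256) != 1)
   fprintf(stderr, "set_palette() failed\n");
}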
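A side note on the `* 4` in the loop: VGA DAC components are 6-bit (0..63), so multiplying by 4 maps them into the 8-bit range, but the brightest value ends up as 252, not 255. The Python sketch below (an illustration, not code from the port; the names are made up) shows the same conversion next to a common alternative that replicates the top bits to reach the full range.

```python
def vga_to_rgb(palette: bytes, full_range: bool = False) -> list:
    """Convert 6-bit VGA RGB triplets to 8-bit (r, g, b) tuples."""
    if len(palette) % 3 != 0:
        raise ValueError("palette must consist of RGB triplets")

    def expand(value: int) -> int:
        value &= 0x3F  # only the low 6 bits are meaningful
        if full_range:
            return (value << 2) | (value >> 4)  # 63 -> 255
        return value * 4                        # 63 -> 252, as in set_palette()

    return [tuple(expand(c) for c in palette[i:i + 3])
            for i in range(0, len(palette), 3)]


print(vga_to_rgb(bytes([63, 0, 32])))
print(vga_to_rgb(bytes([63, 0, 32]), full_range=True))
```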


And the same had to be done with resolution changing, on-screen rendering, etc.

Finally, after some days, we saw the intro! Kinda... (the colors were OK)
intro... kinda


That evening we worked until we finally got the proper intro showing up, playing at high speed (the timers were just stubs at that time), and finally crashing :)

The last thing we did in this attempt was displaying the menu. After that Unavowed moved abroad and went offline, and the project was forgotten, waiting for its time...

The final attempt


You know how it is - once you've got that far, a project might be forgotten, but it will pop up from time to time.

About two years ago the project was revived, but we decided to start from scratch – we learned a few things here and there, gained some experience, and we thought that it could be done better.

So, Unavowed created a new disassembler, this time based on the binutils package, that produced recompilable listings in GNU Assembler (as) format, AT&T style this time (he he he I can see some of you going 'at&t??? omg blah yuck'). The first disassembler did not give us any guarantee that its output could be correctly recompiled, because it walked linearly through the bytes in the image, so the new version traced all branching instructions to map out all reachable code. Also, the wrapper-generating script and the C part of the port were rewritten.
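The difference between the two approaches is easy to show on a toy example. The Python sketch below is not the binutils-based tool; it is a made-up miniature that understands only five x86 opcodes (nop, ret, jmp rel8, jcc rel8, call rel32), which is enough to see a linear sweep choke on data embedded in the code while the branch-following version skips it.

```python
def decode(code: bytes, pc: int):
    """Return (length, branch_targets, falls_through) for a tiny x86 subset."""
    op = code[pc]
    if op == 0x90:                                   # nop
        return 1, [], True
    if op == 0xC3:                                   # ret
        return 1, [], False
    if op == 0xEB or 0x70 <= op <= 0x7F:             # jmp rel8 / jcc rel8
        rel = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
        return 2, [pc + 2 + rel], op != 0xEB
    if op == 0xE8:                                   # call rel32
        rel = int.from_bytes(code[pc + 1:pc + 5], "little", signed=True)
        return 5, [pc + 5 + rel], True
    raise ValueError("opcode 0x%02x is not handled in this sketch" % op)


def linear_sweep(code: bytes) -> list:
    """Decode byte after byte, the way the first disassembler did."""
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += decode(code, pc)[0]
    return starts


def reachable(code: bytes, entry: int = 0) -> list:
    """Follow branches from the entry point and collect instruction starts."""
    seen, work = set(), [entry]
    while work:
        pc = work.pop()
        while pc not in seen and 0 <= pc < len(code):
            seen.add(pc)
            length, targets, falls_through = decode(code, pc)
            work.extend(targets)
            if not falls_through:
                break
            pc += length
    return sorted(seen)


# jmp +2; two data bytes (which happen to look like a call); nop; ret
code = bytes([0xEB, 0x02, 0xE8, 0xFF, 0x90, 0xC3])
print(linear_sweep(code))
print(reachable(code))
```

In the sample, the linear sweep decodes the data bytes as a 5-byte call and swallows the real nop and ret, while the traversal lands exactly on them. A real tool additionally has to deal with indirect jumps and jump tables, which this sketch ignores.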

Having notes from the previous attempt, we got to the same point very fast, and keyboard, mouse, and timers became our focus.
Well, I guess I can't tell you anything new here - the previously used method was good enough, and in a few days' work we got the first level of the game running.
Also, Unavowed insisted on playing the game music from ogg files, and that was also implemented at that time, as well as resizing (not resampling, we wanted to keep the old-school look) the 320x200 parts of the game (videos and the low-res mode in the game itself) to 640x480.
A few more days, and Unavowed got the sound hooked up to OpenAL, and it even stopped sounding like 'gzzzzzbzzzzzzmzzzzfzjiiiiiiiiiiiiiiiitbrrrrrrrrrrr!' :)
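About the resizing mentioned above: "not resampling" means that pixels are only repeated, never blended, so the image stays blocky. A nearest-neighbour scaler does exactly that. The Python sketch below shows the general technique on an 8-bit (one byte per pixel) buffer; it is an illustration under that assumption and not necessarily how the port's actual routine is written.

```python
def scale_nearest(src: bytes, src_w: int, src_h: int,
                  dst_w: int, dst_h: int) -> bytes:
    """Scale an 8-bit image by picking the nearest source pixel."""
    dst = bytearray(dst_w * dst_h)
    for y in range(dst_h):
        src_row = (y * src_h // dst_h) * src_w
        dst_row = y * dst_w
        for x in range(dst_w):
            dst[dst_row + x] = src[src_row + x * src_w // dst_w]
    return bytes(dst)


# a 2x2 image blown up to 4x4: every pixel simply becomes a 2x2 block
print(list(scale_nearest(bytes([1, 2, 3, 4]), 2, 2, 4, 4)))
```

For the game screens the call would be `scale_nearest(frame, 320, 200, 640, 480)`; since 480/200 is not an integer, some source rows get repeated twice and some three times.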

Well, it was far from complete - some parts of the menu crashed, and the game also crashed when starting the second mission.

Uh, that second bug took us two weeks to sort out! The funniest thing was that it was not in our code, but in the original code. And when we fixed it and checked the cross-references, we found out that there actually is a flag, /g, that could be provided on the command line and that “turned off” this bug. And guess what - there was a BAT script in the original game that looked like this:

@main /w /g

I could say that we wasted some time there, but no.. we learned a thing or two there :)

In the meantime, both Unavowed and I got access to Macs, and we decided to port the game to OSX too.
I would like to say that porting to OSX went smoothly and without any problems. Yes, really, I would like to say that, but I can't, since it was terrible.

First of all, OSX has a terribly old binutils version that didn't like some of the directives and other syntax constructs we used (e.g., instead of the .global keyword, the OSX binutils expected .globl).
Additionally, we found out that the OSX ABI needs the stack to be aligned to 16 bytes on each function entry, a requirement that did not exist on the other platforms we had Syndicate Wars Port running on.
But even after this, we got really strange crashes in the OSX version, with the execution landing in the middle of instructions. It turned out that the Apple version of the binutils assembler misassembled the loop and loopnz instructions: the target address would always end up a few bytes off from the real target. To fix this, and the problem with unsupported directives, we wrote a filter that replaced modern directives with ancient ones, and replaced loop/loopnz with longer instruction sequences that did the same job.
Also, there were some other more or less time-consuming issues, but finally we got it running on OSX too.
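To give an idea of what such a filter looks like, here is a minimal Python sketch of a line-by-line rewriter for AT&T-syntax listings. It is a made-up illustration, not our actual filter: it handles only the .global directive and the plain loop instruction, and its replacement for loop (decl + jnz) clobbers the flags, which the real loop instruction does not, so it is only valid where the flags are dead afterwards. loopnz is deliberately left alone here, since a faithful replacement also has to test ZF before touching ecx.

```python
import re

GLOBAL_RE = re.compile(r"^(\s*)\.global\b")
LOOP_RE = re.compile(r"^(\s*)loop\s+(\S+)\s*$")


def filter_line(line: str) -> str:
    """Rewrite one listing line (without its trailing newline)."""
    line = GLOBAL_RE.sub(r"\1.globl", line)
    m = LOOP_RE.match(line)
    if m:
        indent, target = m.groups()
        # NOTE: unlike loop, decl modifies the flags (see the text above)
        return "%sdecl %%ecx\n%sjnz %s" % (indent, indent, target)
    return line


for sample in ["\t.global _main", "\tloop .L1", "\tmovl %eax, %ebx"]:
    print(filter_line(sample))
```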

SWars Port running on Mac with a bottle of Chanoine in the background


Well, since the reverse-engineering work was done, we started to gather the library licenses, j00ru wrote a command-line CDDA ripper for us, and Xa made a cool-looking icon and the graphics for the project site. And the ring, er, the project, was forgotten again.

Until a week ago, when we finally decided that a year is enough, and a finished project should be published (maybe someone else also wants to play this game). Of course, during this year both Windows 7 and Mac OSX 10.6 came out, and we found out that there are some minor problems compiling / installing the Port on these systems. Well, actually the Windows one was related to x86-64 mode, not to Windows 7 itself, but I found it out later.

Anyway, the end of this story is known to you - we've finally released it, it can be downloaded, there are some screenshots, there is a video.

At the end, I would like to thank Unavowed for the chance to take part in this project. Additional thanks go to j00ru, MeMeK, oshogbo, Blount, and xa for their contributions. Also, I would like to thank joostp for his positive feedback during the making of the Port, and Arashi for patience :) Thanks :)

Victory